Expert Systems with Applications — Latest Matching Preprints

1

A Digital Twin for Tracking and Forecasting Glycemia with Septic Patients in ICUs

Cao, X.; Wei, X.; Hou, J.; cai, c.; Wang, Q.

2026-05-04 endocrinology 10.64898/2026.04.24.26351177 medRxiv

Top 0.1%

3.1%

Show abstract

We present a digital twin framework for real-time glucose monitoring and forecasting in septic patients in intensive care units (ICUs). The framework combines advanced machine learning models trained on continuous glucose measurements with a dynamic transfer-learning workflow that enables rapid deployment to individual patients and supports personalized, adaptive, and predictive clinical decision-making. Built on a foundation model--a pretrained time-series transformer--the digital twin continuously updates its parameters as new patient data arrive and produces rolling near-term forecasts in real time. To assess adaptability and computational efficiency, we deployed the pretrained model to ten septic patients and evaluated multiple retraining strategies, including zero-shot inference, linear probing, and full and staged fine-tuning. Results show that the model can be initialized and personalized for a new patient within seconds on a standard laptop while achieving accurate glucose forecasts under varying data conditions. These findings demonstrate the feasibility of real-time model personalization in resource-constrained, high-acuity environments and highlight the potential of digital twins as scalable, AI-enabled platforms for continuous physiological monitoring, clinical decision support, and individualized treatment design in the ICU.

2

Failure detection in medical image classification under realistic distribution shifts: A large-scale benchmark

Steinmetz, P.; Frouin, F.; Morard, V.; Buvat, I.

2026-05-05 radiology and imaging 10.64898/2026.05.04.26350496 medRxiv

Top 0.1%

2.6%

Show abstract

Medical images (MI) exhibit variability due to different acquisition protocols, devices, and patient populations, making failure detection at inference time essential for reliable deployment of clinical classifiers. As existing evaluations of failure detection methods use different settings, it is difficult to compare results and identify the best strategy, if any. We present a comprehensive benchmark of eight confidence scoring functions and two score-aggregation strategies across eight MI tasks spanning diverse modalities, backbone architectures, training setups, and failure sources. The confidence ranking ability and classification error mitigation are jointly evaluated. While no single method systematically dominated across settings, aggregation of confidence scores consistently matched or approached the best individual method and substantially reduced silent failure rate. The failure detection performance was strongly correlated with classifier accuracy for all tested settings. These findings provide large-scale evidence regarding the strengths and limitations of confidence scoring strategies and offer actionable guidance for mitigating silent failures under realistic distribution shifts in MI.

3

A Consensus-Driven Stacking Ensemble Framework for Interpretable Cardiovascular Risk Prediction and Clinical Deployment

Sozol, S. S.; Dev Nath, B. C.; Fahim, F. M. S.; Suzana, N. N.; Mirza, J. F.; Ahmmed, S.; Zohra, F.-T.; Zafr, A. H. A.; Uddin, M. N.; Mondal, M. R. H.; Hoque, A. S. M. L.

2026-05-26 health informatics 10.64898/2026.05.18.26352989 medRxiv

Top 0.1%

1.9%

Show abstract

Machine learning (ML) is being considered to help diagnose cardiovascular diseases (CVD). Still, challenges like inconsistent and limited datasets, limited infrastructure, and global inequalities lead to the need for a reliable and practicable ML solution. This paper presents an ML-driven framework for predicting CVD risk scores and classifying status. Several data preprocessing techniques, including multiple imputation by chained equations (MICE), outlier removal, are considered. In addition, hyperparameter tuning is performed with the GridSearchCV tuning technique. Moreover, a consensus-driven five-feature selection method is applied to identify optimal predictors. The dataset used in this study contains healthcare records related to future CVD risk scores, comprising 1,529 patient records with 22 features. The optimized stacked ensemble model is applied to the dataset and achieves a cross-validated coefficient of determination value of 98.13% for CVD risk score regression. Comparative evaluation with other ML models confirmed improved accuracy, efficiency, and interpretability. The explainable AI technique SHAP is applied to interpret predictions and highlight key risk factors. Moreover, a deployment-ready web platform with multi-role access has been developed that demonstrates clinical applicability. The proposed framework offers a reliable and interpretable tool for early detection of CVD and personalized risk assessment. In the future, this work can be extended to integrate longitudinal data, medical imaging, and deep learning to improve generalizability and strengthen real-world impact.

4

Automatic Bevacizumab Response Prediction in Ovarian Cancer from Digital Pathology Images via Novel AI-based Computational Pipeline

Alsaiari, A.; Turki, T.; Taguchi, Y.-h.

2026-05-04 bioinformatics 10.64898/2026.04.29.721782 medRxiv

Top 0.1%

1.8%

Show abstract

Ovarian cancer is one of the gynecological cancer types, which, if metastasized and not detected early, can cause deaths among women. Therefore, there is a need to accurately predict drug responses to ovarian cancer. A gynecological pathologist inspects abnormality in tissues, followed by providing a report about patients; however, such a diagnostic process is (1) hard; (2) requires experience; and (3) time consuming. Moreover, existing tools are far from perfect. Hence, we present a computational pipeline to improve predicting drug response pertaining to ovarian cancer, derived as follows. First, we download digital pathology images pertaining to ovarian bevacizumab response from the cancer imaging archive repository. We employed histogram of oriented gradients to images, constructing feature vectors, provided to Fisher linear discriminant analysis to change the representation through dimensionality reduction. Then, we provide reduced-dimensionality data for regression analysis through support vector regression coupled with various kernels and calculating the area under the ROC curve (AUC). Experimental results against transformer-based models (ViT and Swin) and other deep learning (DL) models (VGG16, ResNet50, InceptionV3, MobileNetV2, and EfficientNetB6) demonstrate that our approach with radial kernel (named SVRD+R) yielded an AUC performance improvements of 17% against the best-performing transformer-based model (ViT) while obtaining an AUC performance improvements of 14.9% when compared against the best DL-based model (MobileNetV2). These results demonstrate the superiority and feasibility of our AI-based pipeline when tackling prediction problems pertaining to gynecologic cancer studies. MSC92B05; 68T09

5

Precision Physical Activity Prescription via Reinforcement Learning for Functional Actions

Lin, G.; Miao, R.; Sacheck, J.; Zhang, X.

2026-05-21 public and global health 10.64898/2026.05.18.26353525 medRxiv

Top 0.1%

1.7%

Show abstract

Physical activity (PA) plays an important role in maintaining and improving health. Daily steps have been a key PA measure that is easily accessible with common wearable devices. However, methods are lacking to recommend a personalized optimal distribution of daily steps over a period of time for the best of certain health biomarkers. In this paper, we fill this void based on the data from the All of Us Research Program which includes months of step counts as well as repeated measurements of key health biomarkers. We develop a new offline reinforcement learning (RL) algorithm to learn personalized and optimal PA distributions associated with cardiometabolic risk, where the action is a function representing the daily step distribution over a period of time. Simulation studies demonstrate the advantage of the proposed approach over existing continuous-action RL methods. The learned optimal policy from the All of Us data generally suggests people take more daily steps and also follow a more consistent pattern of PA over time while offering tailored recommendations for subgroups in blood glucose level, body mass index, blood pressure, age, and sex.

6

Demographic trends and forecasts of alcohol-associated liver disease in the United States, 2008-2030

Viguerie, A.; Iacomini, E.; D'Orsogna, M. R.

2026-05-13 public and global health 10.64898/2026.05.09.26352799 medRxiv

Top 0.2%

1.4%

Show abstract

AO_SCPLOWBSTRACTC_SCPLOWAlcohol-associated liver disease (ALD) has been steadily increasing in the United States for many years, as attested by increases in ALD deaths and liver transplant demand. Direct measurement of ALD incidence is challenging as diagnosis often occurs late (or not at all). This study employs a demographically-aware backcalculation method, based on mortality data, to reconstruct latent, age-structured ALD risk and incidence trends in the US population from 2008 to 2022 and uses this information to forecast future ALD trends through 2030. We find that ALD incidence has risen steadily since 2008, with a sharp increase during the 2020 COVID-19 pandemic, and that the average age at onset has also increased over time, with demographic factors playing a substantial role. While our forecasts suggest a continuation of the pre-2020 growth in ALD incidence for most age and sex groups, we also predict marked increases among younger men, a generational shift toward older age cohorts, and substantial rises among older females. Most concerning, between 2022 and 2030, incidence is expected to double among younger men and older females and by 2030 the number of new male ALD cases is projected to be more than twice that of females for all age groups. Our results provide a clearer understanding of evolving ALD trends, highlighting the role of demographic and birth cohort effects. We underscore the urgent need for targeted interventions, particularly among younger men, to reduce ALD-related behaviors and future burden.

7

Optimizing Screening for Intrauterine Fetal Growth Restriction in Low-Resource Settings Using 2D Ultrasound: A Deep Learning Approach

Enywaku, A.; Asiku, R. A.

2026-05-05 radiology and imaging 10.64898/2026.05.04.26352354 medRxiv

Top 0.2%

1.4%

Show abstract

Severe fetal growth restriction (sFGR) affects 5 to 10% of pregnancies worldwide and is a major contributor to perinatal morbidity and mortality, particularly in low- and middle-income countries (LMICs). Traditional 2D ultrasound detection methods suffer from operator dependency, gestational age uncertainty, and limited access to Doppler in many low-resource facilities. This study presents a deep learning framework for sFGR screening and triage using 2D fetal abdominal ultrasound images designed to operate independently of precise gestational dating. Growth restriction severity labels were derived by mapping abdominal circumference measurements to INTERGROWTH-21st term percentiles as a gestational-age-normalized proxy for fetal size restriction when case-level gestational age or birth-weight data are unavailable. A systematic literature review of 37 studies revealed gaps in severity stratification and generalizability. We implemented a DenseNet-121-based model with abdominal circumference measurement for severity-aware classification using a retrospective single-center dataset of 1588 annotated fetal abdominal images from 169 term pregnancies. Patient-wise 3-fold cross-validation and ensemble testing yielded 93.7% accuracy, a weighted F1-score of 0.76, and ROC AUC [≥] 0.98 per class on heldout data. The approach outperforms previously reported single-center methods on this dataset while explicitly targeting LMIC-specific constraints. It demonstrates potential as a gestational-age-independent first-line triage layer for equitable prenatal screening, subject to prospective multi-site validation.

8

Pharmacological inhibition of deubiquitinase UCH-L1 by LDN57444 sensitises hepatocellular carcinoma to sorafenib by reverting drug-induced adaptive responses

Van De Vijver, E.; Decroix, K.; Burggraeve, D.; Van Wassenhove, P.; De Vos, Z.; Ampe, C.; Devisscher, L.; Van Vlierberghe, H.; Van Troys, M.

2026-05-19 cancer biology 10.64898/2026.05.15.725527 medRxiv

Top 0.2%

1.2%

Show abstract

Background and aimsTherapeutic outcomes for advanced hepatocellular carcinoma remain inadequate, despite recent advances using immunotherapy. Long-term effectiveness of systemic therapies, including second-line multi-tyrosine kinase inhibitor sorafenib, is limited by resistance mechanisms and adverse effects. Upregulated deubiquitinase UCH-L1 is frequently correlated with poor prognosis in cancers. Here, we investigated the therapeutic potential of combining pharmacological UCH-L1-inhibition with sorafenib in HCC. MethodsUCH-L1 expression was analysed in TCGA-LIHC data and patient-derived HCC tissues. Sorafenib and LDN57444 effects were evaluated in vitro in cytotoxicity and invasion assays. Gene and protein expression were examined by RT-qPCR, Western blotting and immunohistochemistry. In vivo efficacy of drug synergy was assessed in an orthotopic xenograft mouse HCC model. ResultsIn silico data-analysis revealed significantly higher UCH-L1 levels in patient HCC tumours versus non-tumour, associated with reduced overall survival. Low-dose sorafenib upregulated UCH-L1 in HCC cell line Hep3B. Paradoxically, this also promoted invasiveness and sustained MEK1/2-ERK1/2-pathway activation. Combining low-dose sorafenib with LDN57444 produced strong synergistic cytotoxicity in vitro, reverted MAPK-activation and suppressed invasion. Consistently, at low sorafenib dose co-treatment with LDN57444 completely inhibited tumour growth of Hep3B xenografts and enhanced sorafenib efficacy. ConclusionLDN57444 sensitises HCC cells to low-dose sorafenib by reverting drug-induced pro-oncogenic signalling and thereby strongly synergises with sorafenib to enhance anti-tumour efficacy in a HCC mouse model. This presents UCH-L1 as a player in treatment-induced adaptive response and supports further exploring UCH-L1-targeting in combination with sorafenib as therapeutic avenue for advanced HCC. O_FIG O_LINKSMALLFIG WIDTH=200 HEIGHT=144 SRC="FIGDIR/small/725527v1_ufig1.gif" ALT="Figure 1"> View larger version (37K): org.highwire.dtl.DTLVardef@176dc91org.highwire.dtl.DTLVardef@8acae8org.highwire.dtl.DTLVardef@f71bborg.highwire.dtl.DTLVardef@1f3c5aa_HPS_FORMAT_FIGEXP M_FIG C_FIG Lay summaryThis study explores a new treatment approach for hepatocellular carcinoma (HCC) by combining two drugs: LDN57444, which blocks the enzyme UCH-L1, and sorafenib, a FDA-approved multi-tyrosine kinase inhibitor. We evaluated the effect of this drug combination in vitro using a HCC cell line and in an mouse HCC-model. The drug combination displayed strong, synergy in lowering HCC cell viability, and greatly reduced invasiveness and in vivo tumour growth. LDN57444 sensitised HCC cells to low doses of sorafenib by preventing UCH-L1-mediated activation of pro-oncogenic signalling. These findings highlight the potential of this new drug combination for treating advanced HCC thereby potentially reducing side-effects and countering drug resistance. Impact and implicationsOur preclinical research introduces a novel combination strategy against advanced HCC that holds potential to improve existing therapies, particularly the second-line multi-tyrosine kinase inhibitor sorafenib. The proposed combination of sorafenib with an inhibitor of the deubiquitinase UCH-L1 not only enhances sorafenib efficacy but present promise to also counter resistance mechanisms. Moreover, because effective responses are achieved at lower drug doses, this may in addition reduce therapy-associated adverse effects further increasing potential impact. While sorafenib is FDA-approved, the UCH-L1 inhibitor LDN57444 needs further (clinical) development to bring our promising findings to full translational potential for HCC patients and physicians.

9

DISCERN: A Clinical Impact-aware Framework for Radiology Report Comparison

Sharma, R.; Beeche, C.; Dong, J.; Zhuang, R.; Qu, H.; Zhang, R.; Gangaram, V.; Goswami, P.; Xin, J.; Ballard, J.; Goldberg, A.; Sagreiya, H.; Long, Q.; Chen, T.; Witschey, W. R.

2026-05-27 radiology and imaging 10.64898/2026.05.26.26353612 medRxiv

Top 0.2%

1.2%

Show abstract

The surge in medical imaging has spurred the development of vision-language models (VLMs) to alleviate radiologist workloads. However, clinical deployment is hindered by the lack of meaningful evaluation frameworks. Current metrics - ranging from semantic similarity to large language model (LLM) based judges - often fail to distinguish between clinically trivial and critical discrepancies, poorly reflecting real-world clinical judgment. To address this, we introduce DISCERN (Discordance and Significance-aware Entity-level Radiology Report Comparison). DISCERN is a significance-aware framework that weighs report errors based on their potential impact on patient care. Our results demonstrate that DISCERN powered by closed source LLMs aligns more closely with expert radiologist assessments than traditional metrics or current LLM evaluators, providing a more interpretable and clinically relevant benchmark. By modeling radiologist prioritization and entity-level feedback, DISCERN facilitates targeted model refinement and ensures the safer integration of generative AI into clinical workflows.

10

Data Assimilation Substitutes for Biological Complexity in Hybrid Influenza Forecasting Models

Alleman, T. W.; Van Wesemael, T.; Shanker, N.; Mietchen, M. S.; Loo, S.; Ajagbe, S. O.; Baetens, J. M.; Lemaitre, J.; Hill, A. L.; Truelove, S. A.; Bento, A. I.

2026-05-27 public and global health 10.64898/2026.05.19.26353597 medRxiv

Top 0.2%

1.2%

Show abstract

Hybrid mechanistic-statistical models offer interpretability and adaptability for short-term seasonal epidemic forecasting, but it remains unclear whether their accuracy depends more on increased biological complexity or on the assimilation of richer data. Using eight retrospective influenza seasons in North Carolina, we evaluate whether training on historical data and assimilating auxiliary emergency department (ED) visit data improves four-week-ahead hospital admission forecasts more than adding biological complexity (multi-subtype structure and cross-season immunity). Hierarchical Bayesian training on historical data improves accuracy by 22.4 % (95 % CI: 16.4-28.1 %), and inclusion of ED visit data yields a further 5.3 % (95 % CI: 3.0-7.6 %) improvement, whereas added biological complexity produces diminishing or null gains. We further observe a substitution effect in which ED visit data partially compensates for omitted biological structure. We deployed a simplified model variant in the 2025-2026 CDC FluSight Challenge and ranked among the top ensemble performers, supporting the robustness of Bayesian hierarchical training in real time. Together, these findings indicate that short-term forecast accuracy is driven more by historical learning and assimilating auxiliary signals than by biological fidelity, with implications for how forecasting systems should balance mechanistic complexity.

11

Multi-Agent AI for Chest Radiography: A Sequential Segmentation and LLM-Driven Consultative Tool for Medical Training

Kurt, F.; Subasi, A.

2026-06-01 health informatics 10.64898/2026.05.29.26354432 medRxiv

Top 0.2%

1.2%

Show abstract

Background: Traditional diagnostic models lack explainability, while multimodal language models prone to hallucination remain unsafe for medical education. An interactive, risk-free artificial intelligence framework is required to serve as a reliable clinical mentor for radiology trainees. Methods: We propose a multi-agent architecture decoupling deterministic image analysis from generative consultation. Specialized computer vision models perform anatomical localization and pathological segmentation. These quantitative outputs are synthesized into a structured payload, which grounds a locally hosted large language model (LLaVA 7B) using strict prompt guardrails and prerequisite protocols. Results: The system effectively eliminates visual hallucinations by intercepting unanchored queries. The artificial intelligence tutor successfully contextualizes spatial anomalies and baseline metrics, generating accurate conversational explanations and formally structured radiology reports while strictly enforcing medical safety disclaimers. Discussion and Conclusion: By anchoring language generation exclusively to verified algorithmic realities, this framework transforms opaque diagnostic models into safe, interactive educational simulators. This establishes a highly reliable paradigm for integrating explainable artificial intelligence into medical training.

12

An Efficient and Interpretable Learning Approach for Large-Scale Histopathology Data

Moore, C.; Gupta, V.; Neupane, S.; Tripathi, H.

2026-05-03 health informatics 10.64898/2026.04.30.26352196 medRxiv

Top 0.3%

1.0%

Show abstract

Prostate cancer (PCa) remains one of the leading causes of cancer-related mortality among men, and histopathological analysis of prostate biopsy specimens is central to diagnosis and risk stratification. Whole-slide Images (WSIs) capture rich morphological information, but their gigapixel scale and the large number of extracted tissue patches make exhaustive annotation and model training computationally expensive. Attention-based Multiple Instance Learning (MIL) has emerged as an effective weakly supervised framework for WSI analysis, enabling slide-level prediction without requiring patch-level annotations. However, training MIL models on large histopathology cohorts remains resource intensive because many extracted patches are non-informative, and some patches are often processed repeatedly during training. To address these challenges, we propose an efficient and interpretable learning framework for large-scale histopathology analysis. Our method combines a pathology-pretrained UNI encoder, a Clustering-constrained Attention Multiple instance learning-Single Branch (CLAM-SB) attention-based MIL model, and a window-based training strategy that reduces computational overhead while preserving predictive performance. The paper illustrates our proposed approach and experiments on TCGA-PRAD WSIs for the PCa patients. Processing 189,600 sampled patches across 79 WSIs with our proposed approach reduced total training time by 57.5% (20 to 8.5 hours for 5 epochs) and 41.4% (27 to 16 hours for 10 epochs), respectively, underscoring its potential as a practical and resource-efficient strategy for scalable prostate histopathology analysis.

13

An Interpretable Multimodal Framework for Student Mental Health Risk Assessment Using Temporal Embeddings and Fuzzy Inference

Shah, A.; Mehta, A.; Bhensdadia, C. K.

2026-05-20 health informatics 10.64898/2026.05.16.26352630 medRxiv

Top 0.3%

0.9%

Show abstract

Mental health challenges among university students have increased due to academic pressure, lifestyle changes, and continuous digital engagement. Existing approaches for mental health assessment often rely either on self-reported psychological scales or isolated behavioral indicators, limiting their ability to capture complex temporal and contextual patterns. This study proposes an interpretable multimodal framework for student mental health risk assessment using behavioral sensing, academic information, ecological momentary assessments (EMA), and psychometric survey data. A bidirectional Long Short-Term Memory autoencoder is employed to learn latent temporal representations from day-level behavioral sequences, while graph embeddings capture structural relationships among students using similarity-based neighborhood graphs. These representations are fused with academic and survey-derived features and reduced using Principal Component Analysis and Uniform Manifold Approximation and Projection. K-means clustering is then applied to identify behaviorally distinct student groups. Experimental analysis on the StudentLife dataset demonstrates meaningful clustering performance with a Silhouette Score of 0.4209 and Adjusted Rand Index stability of 0.6869. The identified clusters correspond to low-risk, moderate-risk, and high-risk behavioral profiles. To improve interpretability and practical usability, a fuzzy inference system is introduced to compute mental risk, academic risk, and wellbeing indices using psychometric indicators including PHQ-9, PSS, PANAS, VR-12, and Big Five personality traits. The results demonstrate the potential of combining multimodal behavioral modeling with interpretable fuzzy reasoning to support early mental health risk assessment in educational settings.

14

Automated identification of bolus types in modified barium swallow studies using deep learning: a preliminary study

Mao, S.; Sahli, A. J.; Buoy, S. N.; Hutcheson, C.; Gelabert, G. A.; Barbon, C. E. A.; Naser, M. A.; Fuller, C. D.; Brock, K. K.; Hutcheson, K. A.

2026-05-20 radiology and imaging 10.64898/2026.05.16.26353385 medRxiv

Top 0.4%

0.8%

Show abstract

Purpose: Modified Barium Swallow (MBS) studies utilize videofluoroscopy, a dynamic X-ray technique for evaluating swallowing anatomy and physiology. Each MBS exam typically includes multiple bolus trials, often involving different bolus consistencies. Accurate classification of bolus types is essential, as swallowing dynamics, aspiration risks, and residue levels vary with bolus consistency. In this preliminary study, we propose a deep learning-based approach for automated bolus type classification in MBS, aiming to provide a standardized and efficient framework for automated processing of swallowing assessments. Methods: A total of 206 patients (Mean +/- SD age: 60.24 +/- 9.02 years; 89.32% men) underwent MBS examinations, comprising 277 individual MBS studies. The dataset included 2,752 bolus-level video segments, categorized by bolus type as follows: 1,711 liquid (IDDSI 0-3, 62.17%), 521 pudding (IDDSI 4, 18.93%), and 520 solid boluses (IDDSI 7, cookie or cracker, 18.89%). To standardize variable video lengths for the data pipeline, each MBS video was temporally segmented into a fixed-length frame sequence, with shorter videos padded using static frames and longer videos randomly cropped to the target length. We employed an Inflated 3D convolutional neural network to develop the deep learning model. Results: Each video segment contained an average of 273.03 +/- 195.81 frames. On the independent test set, the deep learning model achieved an overall accuracy of 96.13%, and the macro F1-score was 95.05% in classifying food bolus types within MBS videos. Conclusions: The developed AI-based system demonstrated effective automated classification of food bolus types in MBS videos, representing an important step toward fully automated MBS analysis for swallowing efficiency assessment. The AI model reduces the reliance on manual labels, thereby promising to streamline clinical and research workflows.

15

TopBrain Segmentation Challenge for Whole Brain Vessel Anatomy

Yang, K.; Shi, P.; Huang, H.; Musio, F.; Baazaoui, H.; Aydin, O. U.; Hilbert, A.; Hamadache, R. E.; Yalcin, C.; Zhang, M.; Falcetta, D.; de la Rosa, E.; Shit, S.; Prabhakar, C.; Wittmann, B.; Rokuss, M. R.; Kirchhoff, Y.; Al-Maskari, R.; Hoeher, L.; Juchler, N.; Casamitjana, A.; Cleary, J.; Schmick, A.; Baumgartner, P.; Deseoe, J.; Vandans, O.; Lee, D.; Oh, K.; LaBella, D.; Mazher, M.; Niederer, S. A.; Qayyum, A.; Liu, Y.; Chen, J.; Kim, W.; Asawalertsak, N.; Kim, M.; Shin, D.; Park, S.-H.; Kikuchi, S.; Zhang, Y.; Liu, J.; Cui, Y.; Qiu, Y.; Verschuur, A.; Zhang, J.; van der Schaaf, I.; Su, R.;

2026-05-30 radiology and imaging 10.64898/2026.05.28.26354312 medRxiv

Top 0.4%

0.7%

Show abstract

We present the TopBrain 2025 Challenge, the first benchmark for fine-grained multiclass segmentation of the whole brain vasculature in both computed tomography angiography (CTA) and magnetic resonance angiography (MRA). Building on the TopCoW challenge, TopBrain scales vessel annotation from the Circle of Willis to the entire brain, introducing a dataset of 90 annotated volumes across 48 landmark vessel classes spanning arterial and venous systems, of which 50 training volumes are publicly released. Vessel definitions were consolidated from established neuroanatomical references into a unified annotation scheme, and vessel caliber measurements along the centerline are reported for the first time across the whole brain vascular anatomy. To address the unique challenges of multiclass brain vessel segmentation, we propose an evaluation framework that accounts for detection in segmentation performance, assesses anatomical plausibility, and introduces novel contamination metrics that characterize inter-class prediction errors. Fifteen teams from over 220 registered participants submitted algorithms to the benchmark. The top-performing teams built on nnUNet with principled system design choices, achieving around 80% Dice scores, near-zero invalid neighbor counts, over 60% F1 scores for side-road vessels, and below 18% foreground contamination ratio. Larger vessels are easier to segment, while smaller and more complex vessels remain the true bottleneck. The annotated datasets and podium-finish algorithms are made publicly available on Zenodo.

16

Development of a Deep Learning Model Integrating CT Images and Blood Data for the Diagnosis of Acute Cholecystitis

HORAGUCHI, T.; Nomura, R.; Sakai, S. A.; Saito, N.; Kurihara, K.; Ohira, M.; Takaha, R.; Mitsui, N.; Yokoi, R.; Hatanaka, Y.; Hayashi, H.; Kuno, M.; Fukada, M.; Sato, Y.; Yasufuku, I.; Asai, R.; Bando, H.; Yamashita, R.; Matsuhashi, N.

2026-05-12 radiology and imaging 10.64898/2026.05.08.26352724 medRxiv

Top 0.5%

0.7%

Show abstract

PurposeIn this study, we aimed to develop and evaluate an artificial intelligence-based diagnostic model for the diagnosis of acute cholecystitis (AC) using non-contrast CT images and clinical data. Materials and MethodsThis retrospective study included 199 patients (100 AC, 99 non-AC) treated between January 2016 and December 2025 at a single center. Patients were randomly divided into training (n=139) and test (n=60) datasets. Three models were constructed: an imaging-based deep learning model, a clinical data-based machine learning model, and a hybrid machine learning model integrating deep learning-derived imaging features with clinical data. CT images were preprocessed, and gallbladder regions were segmented. Clinical variables included white blood cell counts and levels of C-reactive protein and liver function markers. Model performance was evaluated using accuracy, precision, recall, specificity, F1 score, and area under the receiver operating characteristic curve (AUC). Statistical comparisons were performed using Welchs t-test and Chi-square test. ResultsThe imaging-based model achieved accuracy 0.883, precision 0.848, recall 0.933, specificity 0.833, and AUC 0.916. The blood-based model achieved accuracy 0.917, precision 0.931, recall 0.900, specificity 0.933, and AUC 0.949. The hybrid model showed the highest performance, with accuracy 0.950, precision 0.909, recall 1.000, specificity 0.900, F1 score 0.952, and AUC 0.986. ConclusionA hybrid model integrating CT imaging and clinical data improved diagnostic performance for AC compared with single-modality models.

17

From Power Spectral Density to Wavelets: Improving Symbolic Representations of Electroencephalography Band Dynamics in the Weed Plot Framework

Meinardi, V.; Boyallian, C.; Giuzio, R.

2026-05-06 neurology 10.64898/2026.05.05.26352441 medRxiv

Top 0.5%

0.7%

Show abstract

Electroencephalography (EEG) interpretation in clinical practice relies on the analysis of energy distribution across standard frequency bands. The Weed Plot framework encodes band-wise spectral energy, computed using Fourier-based methods, into a symbolic representation that preserves the interpretability of traditional EEG analysis. In this study, we propose a wavelet-based extension of this framework, where the energy of predefined clinical EEG bands is estimated using the Discrete Wavelet Transform instead of Power Spectral Density. Unlike Fourier-based approaches, wavelets provide a time-frequency representation that captures transient and non-stationary dynamics while remaining consistent with clinically defined bands. From these estimates, symbolic patterns are constructed based on the relative ordering of frequency bands within short temporal windows. Their empirical distribution is used to extract entropy-based features for epilepsy detection using multiple machine learning classifiers. From an Artificial Intelligence perspective, the main contribution is a structured symbolic encoding that enhances feature discriminability. From an engineering perspective, the contribution lies in an automated framework for EEG-based epilepsy detection. Experimental results show that wavelet-based representations improve classification performance compared to raw entropy and Fourier-based features. This improvement arises from the interaction between time-frequency localization and symbolic encoding, producing more discriminative feature distributions. These findings support wavelet-based symbolic representations as a robust and interpretable framework for EEG analysis, bridging clinical interpretation and data-driven methods.

18

A Prospective Observational Study on a Multimodal Non-Invasive Physiological Monitoring System (Hayl): Feasibility, Signal Characterization, and Exploratory Biomarker Correlation

Choda, G.; Choda, A.

2026-05-17 endocrinology 10.64898/2026.05.13.26353115 medRxiv

Top 0.5%

0.7%

Show abstract

Chronic conditions such as Type 2 Diabetes Mellitus (T2DM) and Hypertension (HTN) remain underdiagnosed in community settings, particularly in resource-limited populations. Conventional diagnostic approaches rely on episodic measurements and laboratory-based assessments, limiting scalability for large-scale screening. Non-invasive physiological monitoring systems offer a potential pathway for accessible and rapid wellness assessment in real-world environments. This study aimed to evaluate the feasibility, signal acquisition performance, and exploratory physiological signal characteristics of a non-invasive multimodal monitoring system (Hayl) in community-based screening settings. Methods: A prospective, cross-sectional, multicenter observational pilot study was conducted across rural and urban screening camps in south India. A total of 281 adult participants were enrolled, including individuals with known T2DM, HTN, and those without known comorbidities, encompassing both symptomatic and asymptomatic subjects. Physiological data were acquired using the Hayl system, which integrates photoplethysmography (PPG) and temperature sensing. Signal acquisition feasibility, waveform quality, and derived signal characteristics were evaluated. Comparative and exploratory analyses were performed across predefined clinical subgroups. The study was conducted under Institutional Ethics Committee approval in accordance with guidelines from the Indian Council of Medical Research. Conclusion: The Hayl system demonstrated high feasibility for physiological signal acquisition, with successful PPG recordings in 274 participants (97.5%) and temperature signals in 279 participants (99.3%). Most recordings exhibited high waveform quality (74.0%), with observable variations in signal characteristics across clinically relevant subgroups. Reduced pulse variability and increased waveform irregularity were more frequently observed in participants with T2DM and HTN, while symptomatic individuals demonstrated greater signal variability compared to asymptomatic participants. Temperature measurements were stable, with a mean peripheral temperature of 33.4 with a variation of 1.2C degrees. These findings support the potential of Hayl as a non-invasive multimodal platform for community-based wellness screening and exploratory signal-based physiological assessment. Further large-scale and longitudinal studies are required to establish clinical utility.

19

Domain-based basal and ambulatory glycemic exposure metrics derived from continuous glucose monitoring: a real-world clinic-based study

Shinde, S. N.; Shinde, R. S.; Bhangaaley, S. Y.

2026-05-26 endocrinology 10.64898/2026.05.24.26353983 medRxiv

Top 0.6%

0.6%

Show abstract

Background: Consensus continuous glucose monitoring (CGM) metrics, including time in range (TIR), time above range (TAR), time below range (TBR), mean glucose, glucose management indicator, and glycemic variability, are essential for modern glucose assessment. However, these whole-day summaries do not explicitly partition nocturnal basal from daytime ambulatory glycemic burden. Objective: To develop and evaluate a complementary domain-based CGM framework that quantifies basal and daytime ambulatory glycemic exposure across oral glucose tolerance test (OGTT)-derived dysglycemia phenotypes. Methods: In this observational, clinic-based study, 253 individuals underwent OGTT with insulin measurement and CGM. Participants were classified using a prespecified OGTT-derived phenotyping algorithm, implemented through a deterministic rules-based web calculator, and collapsed into five groups: NoDM, Increased insulin resistance, Midzone Glycemia, Prediabetes, and Diabetes. CGM files were uniformly reprocessed by selecting the latest contiguous episode and retaining the most recent 15 calendar days with data. The 24-hour profile was partitioned into nocturnal basal (00:00 to <06:00) and daytime ambulatory (06:00 to <24:00) domains. Derived indices included Area of Basal Glycemia (ABG), Area of Prandial/Daytime Ambulatory Glycemia (APG), incremental ABG (iABG), incremental APG (iAPG), and exploratory deficit indices dABG and dAPG. Results: The final dataset contributed 3,647 analyzable CGM days. APG remained higher than ABG across all groups. Mean ABG/APG increased from 80.45/86.38 mg/dL in NoDM to 111.96/124.70 mg/dL in Diabetes. Mean iABG/iAPG increased from 5.65/6.60 to 34.12/38.91 mg/dL, whereas dABG/dAPG declined as dysglycemia worsened. Conclusions: The ABG/APG framework provides interpretable, domain-resolved CGM burden metrics that separate basal from daytime ambulatory exposure and distinguish total burden from above-threshold excess. These indices are proposed as adjunctive metrics to support dysglycemia phenotyping, early risk recognition, and treatment monitoring, but are not intended to replace established consensus CGM metrics or diagnostic criteria. External, prospective validation is required.

20

Composite Certainty: Addressing Metric Degeneracy in Parameter Inference for Model-Based Diagnostics

Koshe, A.; Sobhani Tehrani, E.; Jalaleddini, K.; Motallebzadeh, H.

2026-05-13 bioengineering 10.64898/2026.05.09.724027 medRxiv

Top 0.6%

0.6%

Show abstract

Quantifying the diagnostic dispersion of inferred parameter distributions is a challenge in uncertainty-aware modeling. Scalar summaries such as credible interval width are topology-blind; fundamentally different posterior morphologies can yield identical scores, obscuring whether a parameter is precisely estimated or constrained to a range. We propose a Composite Certainty Framework that addresses this metric degeneracy by aggregating five complementary uncertainty metrics including interquartile range, standard deviation, full width at half maximum, Shannon entropy, and mass width. These metrics are aggregated through non-parametric Borda rank voting into a single, unitless consensus certainty score. Applied to a simulation-based inference pipeline for a finite-element model of the human middle ear tuned to cadaveric acoustic measurements, the framework reveals parameter-specific identifiability profiles invisible to any individual metric. It produces two actionable clinical thresholds: (1) the maximum tolerable measurement noise for reliable parameter recovery, and (2) the minimum simulation budget for posterior convergence. We demonstrated that no single metric captures all aspects of posterior dispersion, as spread-based metrics and entropy diverge systematically for clinically critical parameters, whereas their aggregation produces a consensus reflecting genuine diagnostic certainty. The framework is generalizable to any model-based diagnostic pipeline where posterior distribution not merely its coverage, but determines clinical certainty.